Abstract: We consider large MIMO systems, where by 'large'
we mean number of transmit and receive antennas of the order
of tens to hundreds. Such large MIMO systems will be of
immense interest because of the very high spectral efficiencies
possible in such systems. We present a low-complexity detector
which achieves uncoded near-exponential diversity performance
for hundreds of antennas (i.e., achieves near SISO AWGN
performance in a large MIMO fading environment) with an
average per-bit complexity of just $O(N_t N_r)$, where $N_t$ and $N_r$
denote the number of transmit and receive antennas, respectively.
With an outer turbo code, the proposed detector achieves good
coded bit error performance as well. For example, in a 600
transmit and 600 receive antennas V-BLAST system with a
high spectral efficiency of 200 bps/Hz (using BPSK and rate-1/3
turbo code), our simulation results show that the proposed
detector performs close to within about 4.6 dB from theoretical
capacity. We also adopt the proposed detector for the
low-complexity decoding of high-rate non-orthogonal space-time
block codes (STBC) from division algebras (DA). For example,
we have decoded the 16 × 16 full-rate non-orthogonal STBC
from DA using the proposed detector and show that it performs
close to within about 5.5 dB of the capacity using 4-QAM
and rate-3/4 turbo code at a spectral efficiency of 24 bps/Hz.
The practical feasibility of the proposed high-performance
low-complexity detector could potentially trigger wide interest in
the implementation of large MIMO systems. We also illustrate
the applicability of the proposed detector in the low-complexity
detection of large multicarrier CDMA (MC-CDMA) systems. In
large MC-CDMA systems with hundreds of users, the proposed
detector is shown to achieve near single-user performance at an
average per-bit complexity linear in number of users, which is
quite appealing for its use in practical CDMA systems.

Index Terms — Large MIMO systems, V-BLAST, non- 
orthogonal STBCs, low-complexity detection, high spectral ef- 
ficiency, multicarrier CDMA. 



I. Introduction 

MULTIPLE-input multiple-output (MIMO) techniques 
offer transmit diversity and high data rates through 
the use of multiple antennas at both transmitter and receiver 
sides [1]-[6]. A key component of a MIMO system is the
MIMO detector at the receiver, which, in practice, is often the 
bottleneck for the overall performance and complexity. MIMO 
detectors including sphere decoder and several of its variants 
[7]-[15] achieve near maximum likelihood (ML) performance 
at the cost of high complexity. Other well known detectors 
including ZF (zero forcing), MMSE (minimum mean square 
error), and ZF-SIC (ZF with successive interference cancellation)
detectors [3], [16] are attractive from a complexity viewpoint,
but achieve relatively poor performance. For example,
the ZF-SIC detector (i.e., the well known V-BLAST detector 
with ordering [17], [18]) does not achieve the full diversity 
in the system. The MMSE-SIC detector has been shown to 
achieve optimal performance [3]. However, these detectors are 
prohibitively complex for large number of antennas of the 
order of tens to hundreds. With small number of antennas, 
the high capacity potential of MIMO is not fully exploited. 
A key issue with using large number of antennas, however, is 
the high detection complexities involved. 

In this paper, we focus on large MIMO systems, where
by 'large' we mean number of transmit and receive antennas
of the order of tens to hundreds. Such large MIMO systems
will be of immense interest because of the very high spectral
efficiencies possible in such systems. For example, in a
V-BLAST system, increased number of transmit antennas means
increased data rate without bandwidth increase. However,
major bottlenecks in realizing such large MIMO systems
include i) physical placement of large number of antennas
in communication terminals¹, ii) lack of practical
low-complexity detectors for such large systems, and iii) channel
estimation issues. In this paper, we address the second problem
in the above (i.e., low-complexity large MIMO detection).
Specifically, we present a low-complexity detector for large
MIMO systems, including V-BLAST as well as high-rate
non-orthogonal space-time block codes (STBC) [29].

The proposed low-complexity detector has its roots in past 
work on Hopfield neural network (HNN) based algorithms 
for image restoration [19], [20], which are meant to handle 
large digital images. HNN based image restoration algorithms 
in [20] are applied to multiuser detection (MUD) in CDMA 
systems on AWGN channels in [21]. This detector, referred 
to as the likelihood ascent search (LAS) detector, essentially 
searches out a sequence of bit vectors with monotonic likeli- 
hood ascent and converges to a fixed point in finite number 
of steps [21]. The power of the LAS detector for CDMA lies 
in i) its linear average per-bit complexity in number of users, 
and ii) its ability to perform very close to ML detector for 
large number of users. Taking the cue from LAS detector's
complexity and performance superiority in large systems, we,
in this paper, successfully adopt the LAS detector for large
MIMO systems and report interesting results.

1 We, however, point out that there can be several large MIMO applications
where antenna placement need not be a major issue. An example of such a
scenario is to provide high-speed back-haul connectivity between base stations
using large MIMO links, where large number of antennas can be placed at
the base stations. Also, tens of antennas can be placed in moderately sized
terminals (e.g., laptops, set top boxes) that can enable interesting spectrally
efficient, high data rate applications like wireless IPTV.

We refer to the proposed detector as MF/ZF/MMSE-LAS²
detector depending on the initial vector used in the algorithm; 
MF-LAS detector uses the matched filter output as the initial 
vector, and ZF-LAS and MMSE-LAS detectors employ ZF and 
MMSE outputs, respectively, as the initial vector. Our major 
findings in this paper are summarized as follows: 

Detection in Large V-BLAST Systems: 

• In an uncoded V-BLAST system with BPSK, the proposed
detector achieves near-exponential diversity for
hundreds of antennas (i.e., achieves near SISO AWGN
performance). For example, the proposed detector nearly
renders a 200 × 200 MIMO fading channel into 200 parallel,
non-interfering SISO AWGN channels. The detector
achieves this excellent performance with an average per-bit
complexity of just $O(N_t N_r)$, where $N_t$ and $N_r$ denote
the number of transmit and receive antennas, respectively.

• With an outer turbo code, the proposed detector achieves
good coded bit error performance as well. For example,
in a 600 transmit and 600 receive antennas V-BLAST
system with a high spectral efficiency of 200 bps/Hz
(using BPSK and rate-1/3 turbo code), our simulation
results show that the proposed detector performs close
to within about 4.6 dB from the theoretical capacity. We
note that performance with such closeness to capacity has
not been reported in the literature so far for such large
number of antennas using a practical complexity detector.

Detection of Large Full-Rate Non-Orthogonal STBCs: 

• We have adopted the proposed detector for the low- 
complexity decoding of large full-rate non-orthogonal 
STBCs from division algebras (DA) in [29]. We decode 
the 16 x 16 full-rate non-orthogonal STBC from DA 
(which has 256 data symbols in one STBC matrix) using 
the proposed detector and show that it performs close 
to within about 5.5 dB from capacity using 4-QAM and 
rate-3/4 turbo code at a spectral efficiency of 24 bps/Hz. 

• We point out that because of the high complexities 
involved in the decoding of large non-orthogonal STBCs 
using other known detectors, the BER performance of
such high-rate large non-orthogonal STBCs has not been
reported in the literature so far. The very fact that we
could show the simulated BER performance plots (both 
uncoded as well as turbo coded) for a 16 x 16 full- 
rate non-orthogonal STBC with 256 complex symbols 
in one STBC matrix in itself is a clear indication of 
the superior low-complexity attribute of the proposed 
detector. To our knowledge, this is the first time that 
simulated BER plots and nearness to capacity results for 
a full-rate 16 x 16 STBC from DA are reported in the 
literature; this became feasible due to the low-complexity 
attribute of the proposed detector. 

2 Throughout the paper, whenever we write MF/ZF/MMSE-LAS, we mean 
MF-LAS, ZF-LAS, and MMSE-LAS. 



Detection in Large Multicarrier CDMA Systems: 

• We also illustrate the applicability of the proposed detector
in the low-complexity detection of large multicarrier
CDMA (MC-CDMA) systems. In large MC-CDMA
systems with hundreds of users, the proposed detector
is shown to achieve near single-user performance, at an
average per-bit complexity linear in number of users,
which is quite appealing for its use in practical CDMA
systems.

The rest of the paper is organized as follows. In section
II, we present the proposed LAS detector for V-BLAST
systems and its complexity. The simulated uncoded and coded
BER performance of the proposed detector for V-BLAST is
presented in section III. Decoding of non-orthogonal STBCs
and BER performance results are presented in section IV.
The LAS detector for MC-CDMA and the corresponding BER
performance results are presented in section V. Conclusions
are presented in section VI.

II. Proposed LAS Detector for Large MIMO 

In this section, we present the proposed LAS detector for
V-BLAST and its complexity. Consider a V-BLAST system
with $N_t$ transmit antennas and $N_r$ receive antennas, $N_t \le N_r$,
where $N_t$ symbols are transmitted from $N_t$ transmit antennas
simultaneously. Let $b_j \in \{+1, -1\}$ be the symbol³ transmitted
by the $j$th transmit antenna. Each transmitted symbol goes
through the wireless channel to arrive at each of $N_r$ receive
antennas. Denote the path gain from transmit antenna $j$ to
receive antenna $k$ by $h_{kj}$. Considering a flat-fading MIMO
channel with rich scattering, the signal received at antenna $k$,
denoted by $y_k$, is given by

$$y_k = \sum_{j=1}^{N_t} h_{kj} b_j + n_k. \tag{1}$$

The $\{h_{kj}\}$, $\forall k \in \{1, 2, \cdots, N_r\}$, $\forall j \in \{1, 2, \cdots, N_t\}$,
are assumed to be i.i.d. complex Gaussian r.v's (i.e., fade
amplitudes are Rayleigh distributed) with zero mean and
$E[(h^I_{kj})^2] = E[(h^Q_{kj})^2] = 0.5$, where $h^I_{kj}$ and $h^Q_{kj}$ are the
real and imaginary parts of $h_{kj}$. The noise sample at the $k$th
receive antenna, $n_k$, is assumed to be complex Gaussian with
zero mean, and $\{n_k\}$, $k = 1, 2, \cdots, N_r$, are assumed to be
independent with $E[n_k^2] = N_0 = \frac{N_t E_s}{\gamma}$, where $E_s$ is the
average energy of the transmitted symbols, and $\gamma$ is the average
received SNR per receive antenna [2]. Collecting the received
signals from all receive antennas, we write⁴

$$\mathbf{y} = \mathbf{H}\mathbf{b} + \mathbf{n}, \tag{2}$$

where $\mathbf{y} = [y_1\ y_2\ \cdots\ y_{N_r}]^T$ is the $N_r$-length received signal
vector, $\mathbf{b} = [b_1\ b_2\ \cdots\ b_{N_t}]^T$ is the $N_t$-length transmitted bit
vector, $\mathbf{H}$ denotes the $N_r \times N_t$ channel matrix with channel
coefficients $\{h_{kj}\}$, and $\mathbf{n} = [n_1\ n_2\ \cdots\ n_{N_r}]^T$ is the
$N_r$-length noise vector. $\mathbf{H}$ is assumed to be known perfectly at
the receiver but not at the transmitter.

3 Although we present the detector for BPSK here, we have adopted it for
M-QAM/M-PAM as well.

4 We adopt the following notation: Vectors are denoted by boldface
lowercase letters, and matrices are denoted by boldface uppercase letters. $[.]^T$,
$[.]^*$, and $[.]^H$ denote transpose, conjugate, and conjugate transpose operations,
respectively. $\Re(.)$ and $\Im(.)$ denote the real and imaginary parts of the complex
argument.



A. Proposed LAS Algorithm 

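The proposed LAS algorithm essentially searches out a
sequence of bit vectors until a fixed point is reached; this
sequence is decided based on an update rule. In the V-BLAST
system considered, for ML detection [16], the most likely $\mathbf{b}$
is taken as that $\mathbf{b}$ which maximizes

$$\Lambda(\mathbf{b}) = \mathbf{b}^T \mathbf{H}^H \mathbf{y} + \mathbf{b}^T (\mathbf{H}^H \mathbf{y})^* - \mathbf{b}^T \mathbf{H}^H \mathbf{H} \mathbf{b}. \tag{3}$$

The likelihood function in (3) can be written as

$$\Lambda(\mathbf{b}) = \mathbf{b}^T \mathbf{y}_{eff} - \mathbf{b}^T \mathbf{H}_{eff} \mathbf{b}, \tag{4}$$

where

$$\mathbf{y}_{eff} = \mathbf{H}^H \mathbf{y} + (\mathbf{H}^H \mathbf{y})^*, \tag{5}$$

$$\mathbf{H}_{eff} = \mathbf{H}^H \mathbf{H}. \tag{6}$$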
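As a quick numerical illustration of (3)-(6), the following minimal sketch builds $\mathbf{y}_{eff}$ and $\mathbf{H}_{eff}$ for a small random channel and checks that $\Lambda(\mathbf{b})$ equals $\|\mathbf{y}\|^2 - \|\mathbf{y} - \mathbf{H}\mathbf{b}\|^2$, so that maximizing $\Lambda(\mathbf{b})$ is the same as minimizing the ML distance. The function names, the 4 × 4 size, the noise level, and the random seed are illustrative assumptions, not taken from the paper.

```python
import random

def effective_quantities(H, y):
    """Return (y_eff, H_eff) of Eqs. (5)-(6); H is a list of rows of complex gains."""
    n_r, n_t = len(H), len(H[0])
    # (H^H y)_j = sum_k conj(h_kj) * y_k
    hy = [sum(H[k][j].conjugate() * y[k] for k in range(n_r)) for j in range(n_t)]
    y_eff = [2.0 * v.real for v in hy]  # H^H y + (H^H y)^* is purely real
    H_eff = [[sum(H[k][i].conjugate() * H[k][j] for k in range(n_r))
              for j in range(n_t)] for i in range(n_t)]
    return y_eff, H_eff

def likelihood(b, y_eff, H_eff):
    """Lambda(b) = b^T y_eff - b^T H_eff b, Eq. (4); real-valued for real b."""
    n_t = len(b)
    quad = sum(b[i] * H_eff[i][j] * b[j] for i in range(n_t) for j in range(n_t))
    return sum(b[j] * y_eff[j] for j in range(n_t)) - quad.real

def residual_energy(H, y, b):
    """Squared Euclidean distance ||y - H b||^2 used by the ML detector."""
    return sum(abs(y[k] - sum(H[k][j] * b[j] for j in range(len(b)))) ** 2
               for k in range(len(y)))

# Hypothetical small example (sizes, noise level and seed are arbitrary choices).
rng = random.Random(0)
n_t = n_r = 4
s = 0.5 ** 0.5  # variance 0.5 per real dimension, as in the system model
H = [[complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in range(n_t)]
     for _ in range(n_r)]
b = [rng.choice((-1, 1)) for _ in range(n_t)]
y = [sum(H[k][j] * b[j] for j in range(n_t))
     + complex(rng.gauss(0, 0.1), rng.gauss(0, 0.1)) for k in range(n_r)]

y_eff, H_eff = effective_quantities(H, y)
energy_y = sum(abs(v) ** 2 for v in y)
gap = likelihood(b, y_eff, H_eff) - (energy_y - residual_energy(H, y, b))
```

Here `gap` should vanish up to floating-point rounding for any choice of `b`, which is why the term $\mathbf{y}^H\mathbf{y}$ can be dropped from the ML metric.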



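Update Criterion in the Search Procedure: Let $\mathbf{b}(n)$ denote
the bit vector tested by the LAS algorithm in the search step
$n$. The starting vector $\mathbf{b}(0)$ can be the output vector from any
known detector. When the output vector of the MF detector
is taken as the $\mathbf{b}(0)$, we call the resulting LAS detector as
the MF-LAS detector. We define ZF-LAS and MMSE-LAS
detectors likewise. Given $\mathbf{b}(n)$, the algorithm obtains
$\mathbf{b}(n+1)$ through an update rule until a fixed point is reached. The
update is made in such a way that the change in likelihood
from step $n$ to $n+1$, denoted by $\Delta\Lambda(\mathbf{b}(n))$, is positive, i.e.,

$$\Delta\Lambda(\mathbf{b}(n)) = \Lambda(\mathbf{b}(n+1)) - \Lambda(\mathbf{b}(n)) > 0. \tag{7}$$

An expression for the above change in likelihood can be
obtained in terms of the gradient of the likelihood function as
follows. Let $\mathbf{g}(n)$ denote the gradient of the likelihood function
evaluated at $\mathbf{b}(n)$, i.e.,

$$\mathbf{g}(n) \triangleq \frac{\partial \Lambda(\mathbf{b}(n))}{\partial \mathbf{b}(n)} = \mathbf{y}_{eff} - \mathbf{H}_{real}\,\mathbf{b}(n), \tag{8}$$

where

$$\mathbf{H}_{real} = \mathbf{H}_{eff} + (\mathbf{H}_{eff})^* = 2\Re(\mathbf{H}_{eff}). \tag{9}$$

Using (4) in (7), we can write

$$\begin{aligned}
\Delta\Lambda(\mathbf{b}(n)) &= \mathbf{b}^T(n+1)\mathbf{y}_{eff} - \mathbf{b}^T(n+1)\mathbf{H}_{eff}\mathbf{b}(n+1) - \left(\mathbf{b}^T(n)\mathbf{y}_{eff} - \mathbf{b}^T(n)\mathbf{H}_{eff}\mathbf{b}(n)\right) \\
&= \left(\mathbf{b}^T(n+1) - \mathbf{b}^T(n)\right)\left(\mathbf{y}_{eff} - \mathbf{H}_{real}\mathbf{b}(n)\right) - \left(\mathbf{b}^T(n+1) - \mathbf{b}^T(n)\right)\left(-\mathbf{H}_{real}\mathbf{b}(n)\right) \\
&\quad - \mathbf{b}^T(n+1)\mathbf{H}_{eff}\mathbf{b}(n+1) + \mathbf{b}^T(n)\mathbf{H}_{eff}\mathbf{b}(n).
\end{aligned} \tag{10}$$

Now, defining

$$\Delta\mathbf{b}(n) \triangleq \mathbf{b}(n+1) - \mathbf{b}(n), \tag{11}$$

and i) observing that $\mathbf{b}^T(n)\mathbf{H}_{real}\mathbf{b}(n) = 2\,\mathbf{b}^T(n)\mathbf{H}_{eff}\mathbf{b}(n)$,
ii) adding & subtracting the term $\frac{1}{2}\mathbf{b}^T(n)\mathbf{H}_{real}\mathbf{b}(n+1)$ to the
RHS of (10), and iii) further observing that
$\mathbf{b}^T(n)\mathbf{H}_{real}\mathbf{b}(n+1) = \mathbf{b}^T(n+1)\mathbf{H}_{real}\mathbf{b}(n)$, we can simplify (10) as

$$\begin{aligned}
\Delta\Lambda(\mathbf{b}(n)) &= \Delta\mathbf{b}^T(n)\left(\mathbf{y}_{eff} - \mathbf{H}_{real}\mathbf{b}(n)\right) - \frac{1}{2}\Delta\mathbf{b}^T(n)\mathbf{H}_{real}\Delta\mathbf{b}(n) \\
&= \Delta\mathbf{b}^T(n)\left(\mathbf{g}(n) + \frac{1}{2}\mathbf{z}(n)\right),
\end{aligned} \tag{12}$$

where

$$\mathbf{z}(n) = -\mathbf{H}_{real}\,\Delta\mathbf{b}(n). \tag{13}$$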
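The simplification from (10) to (12) can be checked numerically. The sketch below is only an illustration under stated assumptions: it draws an arbitrary real symmetric matrix in place of $\mathbf{H}_{real}$ and an arbitrary real vector in place of $\mathbf{y}_{eff}$ (symmetry is the only property the derivation uses), flips two hypothetically chosen bits, and compares the directly computed likelihood change with the right-hand side of (12). All names, sizes, and the seed are illustrative.

```python
import random

def likelihood(b, y_eff, H_real):
    """Lambda(b) = b^T y_eff - (1/2) b^T H_real b, i.e. Eq. (4) for real b."""
    n = len(b)
    quad = sum(b[i] * H_real[i][j] * b[j] for i in range(n) for j in range(n))
    return sum(b[j] * y_eff[j] for j in range(n)) - 0.5 * quad

def predicted_change(b, b_next, y_eff, H_real):
    """Right-hand side of Eq. (12): delta_b^T (g(n) + z(n) / 2)."""
    n = len(b)
    db = [b_next[j] - b[j] for j in range(n)]                      # Eq. (11)
    g = [y_eff[j] - sum(H_real[j][i] * b[i] for i in range(n))     # Eq. (8)
         for j in range(n)]
    z = [-sum(H_real[j][i] * db[i] for i in range(n))              # Eq. (13)
         for j in range(n)]
    return sum(db[j] * (g[j] + 0.5 * z[j]) for j in range(n))

rng = random.Random(1)
n = 6
A = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n)]
H_real = [[A[i][j] + A[j][i] for j in range(n)] for i in range(n)]  # symmetric stand-in
y_eff = [rng.gauss(0, 1) for _ in range(n)]
b = [rng.choice((-1, 1)) for _ in range(n)]
b_next = [-v if j in (0, 3) else v for j, v in enumerate(b)]        # flip bits 0 and 3

direct = likelihood(b_next, y_eff, H_real) - likelihood(b, y_eff, H_real)
mismatch = direct - predicted_change(b, b_next, y_eff, H_real)
```

The value of `mismatch` should be zero up to rounding, whichever bits are flipped.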



Now, given $\mathbf{y}_{eff}$, $\mathbf{H}_{eff}$, and $\mathbf{b}(n)$, the objective is to obtain
$\mathbf{b}(n+1)$ from $\mathbf{b}(n)$ such that $\Delta\Lambda(\mathbf{b}(n))$ in (12) is positive.
Potentially any one or several bits in $\mathbf{b}(n)$ can be flipped (i.e.,
changed from +1 to -1 or vice versa) to get $\mathbf{b}(n+1)$. We
refer to the set of bits to be checked for possible flip in a step
as a check candidate set. Let $L(n) \subseteq \{1, 2, \cdots, N_t\}$ denote
the check candidate set at step $n$. With the above definitions,
it can be seen that the likelihood change at step $n$, given by
(12), can be written as

$$\Delta\Lambda(\mathbf{b}(n)) = \sum_{j \in L(n)} \left(b_j(n+1) - b_j(n)\right)\left[g_j(n) + \frac{1}{2} z_j(n)\right], \tag{14}$$

where $b_j(n)$, $g_j(n)$, and $z_j(n)$ are the $j$th elements of the
vectors $\mathbf{b}(n)$, $\mathbf{g}(n)$, and $\mathbf{z}(n)$, respectively. As shown in [21]
for synchronous CDMA on AWGN, the following update rule
can be easily shown to achieve monotonic likelihood ascent
(i.e., $\Delta\Lambda(\mathbf{b}(n)) > 0$ if there is at least one bit flip) in the
V-BLAST system as well.

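LAS Update Algorithm: Given $L(n) \subseteq \{1, 2, \cdots, N_t\}$, $\forall n \ge 0$,
and an initial bit vector $\mathbf{b}(0) \in \{-1, +1\}^{N_t}$, bits in $\mathbf{b}(n)$
are updated as per the following update rule:

$$b_j(n+1) = \begin{cases}
+1, & \text{if } j \in L(n),\ b_j(n) = -1 \text{ and } g_j(n) > t_j(n), \\
-1, & \text{if } j \in L(n),\ b_j(n) = +1 \text{ and } g_j(n) < -t_j(n), \\
b_j(n), & \text{otherwise},
\end{cases} \tag{15}$$

where $t_j(n)$ is a threshold for the $j$th bit in the $n$th step, which is
taken to be

$$t_j(n) = \sum_{i \in L(n)} \left|(\mathbf{H}_{real})_{j,i}\right|, \quad \forall j \in L(n), \tag{16}$$

where $(\mathbf{H}_{real})_{j,i}$ is the element in the $j$th row and $i$th column
of the matrix $\mathbf{H}_{real}$.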

It is noted that different choices can be made to specify
the sequence of $L(n)$, $\forall n \ge 0$. One of the simplest sequences
corresponds to checking one bit in each step for a possible
flip, which is termed as a sequential LAS (SLAS) algorithm
with constant threshold, $t_j = |(\mathbf{H}_{real})_{j,j}|$. The sequence of
$L(n)$ in SLAS can be such that the indices of bits checked in
successive steps are chosen circularly or randomly. Checking
of multiple bits for possible flip is also possible. Let $L_f(n) \subseteq L(n)$
denote the set of indices of the bits flipped according to
the update rule in (15) at step $n$. Then the updated bit vector
$\mathbf{b}(n+1)$ can be written as

$$\mathbf{b}(n+1) = \mathbf{b}(n) - 2 \sum_{i \in L_f(n)} b_i(n)\,\mathbf{e}_i, \tag{17}$$

where $\mathbf{e}_i$ is the $i$th coordinate vector. Using (17) in (8), the
gradient vector for the next step can be obtained as

$$\begin{aligned}
\mathbf{g}(n+1) &= \mathbf{y}_{eff} - \mathbf{H}_{real}\,\mathbf{b}(n+1) \\
&= \mathbf{g}(n) + 2 \sum_{i \in L_f(n)} b_i(n)\,(\mathbf{H}_{real})_i,
\end{aligned} \tag{18}$$

where $(\mathbf{H}_{real})_i$ denotes the $i$th column of the matrix $\mathbf{H}_{real}$.
The LAS algorithm keeps updating the bits in each step based
on the update rule given in (15) until $\mathbf{b}(n) = \mathbf{b}_{fp}$, $\forall n \ge n_{fp}$,
for some $n_{fp} \ge 0$, in which case $\mathbf{b}_{fp}$ is a fixed point, and it is
taken as the detected bit vector and the algorithm terminates.
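The sequential LAS search described above can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the authors' implementation: it uses circular bit checking with the single-bit threshold $t_j = |(\mathbf{H}_{real})_{j,j}|$, the rule (15), and the incremental gradient update (18). The helper names, the 16 × 16 size, the noise level, the `max_sweeps` guard, and the sign-based matched-filter initial vector are all hypothetical choices made for the example.

```python
import random

def effective_system(H, y):
    """Real-valued y_eff (Eq. (5)) and H_real = 2 Re(H^H H) (Eqs. (6), (9))."""
    n_r, n_t = len(H), len(H[0])
    y_eff = [2.0 * sum(H[k][j].conjugate() * y[k] for k in range(n_r)).real
             for j in range(n_t)]
    H_real = [[2.0 * sum(H[k][i].conjugate() * H[k][j] for k in range(n_r)).real
               for j in range(n_t)] for i in range(n_t)]
    return y_eff, H_real

def likelihood(b, y_eff, H_real):
    """Lambda(b) = b^T y_eff - (1/2) b^T H_real b, i.e. Eq. (4) for real b."""
    n = len(b)
    quad = sum(b[i] * H_real[i][j] * b[j] for i in range(n) for j in range(n))
    return sum(b[j] * y_eff[j] for j in range(n)) - 0.5 * quad

def sequential_las(y_eff, H_real, b0, max_sweeps=100):
    """Sequential LAS with circular checking; stops when a full sweep flips nothing."""
    n = len(b0)
    b = list(b0)
    # Initial gradient g(0) from Eq. (8).
    g = [y_eff[j] - sum(H_real[j][i] * b[i] for i in range(n)) for j in range(n)]
    for _ in range(max_sweeps):  # safety guard; an assumption, not from the paper
        flipped = False
        for j in range(n):
            t = abs(H_real[j][j])  # single-bit threshold, Eq. (16) with L(n) = {j}
            if (b[j] == -1 and g[j] > t) or (b[j] == 1 and g[j] < -t):
                for k in range(n):
                    g[k] += 2 * b[j] * H_real[k][j]  # Eq. (18) with the old b_j(n)
                b[j] = -b[j]                         # Eq. (17)
                flipped = True
        if not flipped:
            break
    return b

# Hypothetical 16 x 16 BPSK example (sizes, noise level and seed are arbitrary).
rng = random.Random(7)
n_t = n_r = 16
s = 0.5 ** 0.5
H = [[complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in range(n_t)]
     for _ in range(n_r)]
b_true = [rng.choice((-1, 1)) for _ in range(n_t)]
y = [sum(H[k][j] * b_true[j] for j in range(n_t))
     + complex(rng.gauss(0, 0.3), rng.gauss(0, 0.3)) for k in range(n_r)]

y_eff, H_real = effective_system(H, y)
b_mf = [1 if v >= 0 else -1 for v in y_eff]  # sign of the matched-filter output
b_las = sequential_las(y_eff, H_real, b_mf)  # MF-LAS in the paper's terminology
```

Each accepted flip raises the likelihood by $2(|g_j(n)| - t_j)$, so `likelihood(b_las, ...)` can never be smaller than `likelihood(b_mf, ...)`, and re-running the search from `b_las` leaves it unchanged.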

B. Complexity of the Proposed Detector for V-BLAST 

In terms of complexity, given an initial vector, the LAS
operation part alone has an average per-bit complexity of
$O(N_t N_r)$. This can be explained as follows. The complexity
involved in the LAS operation is due to three components:
i) initial computation of $\mathbf{g}(0)$ in (8), ii) update of $\mathbf{g}(n)$ in
each step as per (18), and iii) the average number of steps
required to reach a fixed point. Computation of $\mathbf{g}(0)$ requires
the computation of $\mathbf{H}^H\mathbf{H}$ for each MIMO fading channel
realization (see Eqns. (6), (9), and (8)), which requires a per-bit
complexity of order $O(N_t N_r)$. Update of $\mathbf{g}(n)$ in the $n$th
step as per (18) using sequential LAS requires a complexity of
$O(N_t)$, and hence a constant per-bit complexity. Regarding the
complexity component iii), we obtained the average number
of steps required to reach a fixed point for sequential LAS
through simulations. We observed that the average number of
steps required is linear in $N_t$, i.e., constant per-bit complexity
where the constant $c$ depends on SNR, $N_t$, $N_r$, and the initial
vector (see Fig. 5). Putting the complexities of i), ii), and
iii) in the above together, we see that the average per-bit
complexity of LAS operation alone is $O(N_t N_r)$. In addition
to the above, the initial vector generation also contributes to
the overall complexity. The average per-bit complexities of
generating initial vectors using MF, ZF, and MMSE are $O(N_r)$,
$O(N_t N_r)$, and $O(N_t N_r)$, respectively. The higher complexity
of ZF and MMSE compared to MF is because of the need
to perform matrix inversion operation in ZF/MMSE. Again,
putting the complexities of the LAS part and the initial vector
generation part together, we see that the overall average per-bit
complexity of the proposed MF/ZF/MMSE-LAS detector
is $O(N_t N_r)$. This complexity is an order superior compared to
the well known ZF-SIC detector with ordering⁵, whose per-bit
complexity is $O(N_t^2 N_r)$.

III. LAS Detector Performance in V-BLAST 

In this section, we present the uncoded/coded BER performance
of the proposed LAS detector in V-BLAST obtained
through simulations, and compare with those of other detectors.
The LAS algorithm used is the sequential LAS with
circular checking of bits starting from the first antenna bit.
We also quantify how far the proposed detector's turbo
coded BER performance is from the theoretical capacity.
The SNRs in all the BER performance figures are the average
received SNR per receive antenna, $\gamma$, defined in Sec. II [2].

5 Henceforth, we use the term 'ZF-SIC' to always refer to 'ZF-SIC with
ordering'.

A. Uncoded BER Performance 

MF/ZF-LAS performs increasingly better than ZF-SIC for
increasing $N_t = N_r$: In Fig. 1, we plot the uncoded BER
performance of the MF-LAS, ZF-LAS and ZF-SIC detectors
for V-BLAST as a function of $N_t = N_r$ at an average received
SNR of 20 dB with BPSK. The performance of the MF and
ZF detectors is also plotted for comparison. From Fig. 1, we
observe the following:

• The BER at $N_t = N_r = 1$ is nothing but the SISO flat
Rayleigh fading BER for BPSK, given by
$\frac{1}{2}\left[1 - \sqrt{\frac{\gamma}{1+\gamma}}\right]$,
which is equal to $2.5 \times 10^{-3}$ for $\gamma = 20$ dB [22].
While the performance of MF and ZF degrades as $N_t = N_r$
is increased, the performance of ZF-SIC improves
for antennas up to $N_t = N_r = 15$, beyond which a
flooring effect occurs. This improvement is likely due to
the potential diversity in the ordering (selection) in ZF-SIC,
whereas the flooring for $N_t > 15$ is likely due to
interference being large beyond the cancellation ability
of the ZF-SIC.

• The behavior of MF-LAS and ZF-LAS for increasing
$N_t = N_r$ is interesting. Starting with the MF output
as the initial vector, the MF-LAS always achieves better
performance than MF. More interestingly, this improved
performance of MF-LAS compared to that of MF increases
remarkably as $N_t = N_r$ increases. For example,
for $N_t = N_r = 15$, the performance improves by an
order in BER (i.e., $7.5 \times 10^{-2}$ BER for MF versus
$7 \times 10^{-3}$ BER for MF-LAS), whereas for $N_t = N_r = 60$
the performance improves by four orders in BER (i.e.,
$8 \times 10^{-2}$ BER for MF versus $9 \times 10^{-6}$ BER for MF-LAS).
This is due to the large system effect in the LAS
algorithm which is able to successfully pick up much of
the diversity possible in the system. This large system
performance superiority of the LAS is in line with the
observations/results reported in [21] for a large CDMA
system (large number of antennas in our case, whereas
it was large number of users in [21]).

• While the ZF-LAS performs slightly better than ZF-SIC
for antennas less than 4, ZF-SIC performs better than
ZF-LAS for antennas in the range 4 to 24. This is likely
because, for antennas less than 4, the BER of ZF is small
enough for the LAS to clean up the ZF initial vector into
an output vector better than the ZF-SIC output vector.
However, for antennas in the range of 4 to 24, the BER
of ZF gets high to an extent that the ZF-LAS is less
effective in cleaning the initial vector beyond the diversity
performance achieved by the ZF-SIC. A more interesting
observation, however, is that for antennas greater than 25,
the large system effect of ZF-LAS starts showing up. So,
in the large system setting (e.g., antennas more than 25
in Fig. 1), the ZF-LAS performs increasingly better than
ZF-SIC for increasing $N_t = N_r$. We found the number
of antennas at which the cross-over between ZF-SIC and
ZF-LAS occurs to be different for different SNRs.

• Another observation in Fig. 1 is that for antennas greater
than 50, MF-LAS performs better than ZF-LAS. This
behavior can be explained by observing the performance
comparison between MF and ZF detectors given in the
same figure. For more than 50 antennas, MF performs
slightly better than ZF. It is known that ZF detector
can perform worse than MF detector under high
noise/interference conditions [16] (here high interference
due to large $N_t$). Hence, starting with a better initial
vector, MF-LAS performs better than ZF-LAS.

Fig. 1. Uncoded BER performance of MF/ZF-LAS detectors as a function of
number of transmit/receive antennas ($N_t = N_r$) for V-BLAST at an average
received SNR = 20 dB. BPSK, $N_t$ bps/Hz spectral efficiency.

Fig. 2. Uncoded BER performance of ZF-LAS versus ZF-SIC as a function of
average received SNR for a 200 × 200 V-BLAST system. BPSK, 200 bps/Hz
spectral efficiency. ZF-LAS achieves higher order diversity (near-exponential
diversity) than ZF-SIC at a much lesser complexity.
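The single-antenna reference point quoted in the first observation can be reproduced directly from the SISO flat Rayleigh fading BER expression for BPSK. The short sketch below is only a worked check; the function name is an illustrative choice.

```python
import math

def rayleigh_bpsk_ber(snr_db):
    """BPSK bit error rate on a SISO flat Rayleigh fading channel at average SNR gamma."""
    gamma = 10 ** (snr_db / 10)  # dB to linear scale
    return 0.5 * (1 - math.sqrt(gamma / (1 + gamma)))

ber_20db = rayleigh_bpsk_ber(20.0)  # about 2.5e-3, the value at N_t = N_r = 1
```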



ZF-LAS outperforms ZF-SIC in large V-BLAST systems both
in complexity & diversity: In Fig. 2, we present an interesting
comparison of the uncoded BER performance between ZF,
ZF-LAS and ZF-SIC, as a function of average SNR for a
200 × 200 V-BLAST system. This system being a large system,
the ZF-LAS has a huge complexity advantage over ZF-SIC as
pointed out before in Sec. II-B. In fact, although we have taken
the effort to show the performance of ZF-SIC at such a large
number of antennas like 200, we had to obtain these simulation
points for ZF-SIC over days of simulation time, whereas the
same simulation points for ZF-LAS were obtained in just a few
hours. This is due to the $O(N_t^2 N_r)$ complexity of ZF-SIC
versus $O(N_t N_r)$ complexity of ZF-LAS, as pointed out in
Sec. II-B. More interestingly, in addition to this significant
complexity advantage, ZF-LAS is able to achieve a much
higher order of diversity (in fact, near-exponential diversity) in
BER performance compared to ZF-SIC (which achieves only
a little better than first order diversity). This is clearly evident
from the slopes of the BER curves of ZF-LAS and ZF-SIC.
Note that the BER curve for ZF-LAS is almost the same as
the uncoded BER curve for BPSK on a SISO AWGN channel,
given by $Q(\sqrt{\gamma})$ [22]. This means that the proposed detector
nearly renders a 200 × 200 MIMO fading channel into 200
parallel, non-interfering SISO AWGN channels.

LAS Detector's performance with hundreds of antennas: As
pointed out earlier, obtaining ZF-SIC results for more than even
50 antennas requires very long simulation run times, which is
not the case with ZF-LAS. In fact, we could easily generate
BER results for up to 400 antennas for ZF-LAS, which are
plotted in Fig. 3. The key observations in Fig. 3 are that i) the
average SNR required to achieve a certain BER performance
keeps reducing for increasing number of antennas for ZF-LAS,
and ii) increasing the number of antennas results in increased
orders of diversity achieved (close to SISO AWGN performance
for 200 and 400 antennas). We have also observed from
our simulations that for large number of antennas, the LAS
algorithm converges to almost the same near-ML performance
regardless of the initial vector chosen. For example, for the
case of 200 and 400 antennas in Fig. 3, the BER performance
achieved by ZF-LAS, MF-LAS, and MMSE-LAS are almost
the same (although we have not explicitly plotted the BER
curves for MF-LAS and MMSE-LAS in Fig. 3). So, in such
large MIMO systems setting, MF-LAS may be preferred over
ZF/MMSE-LAS since ZF/MMSE-LAS require matrix inverse
operation whereas MF-LAS does not.

Observation i) in the above paragraph is explicitly brought
out in Fig. 4, where we have plotted the average received
SNR required to achieve a target uncoded BER of $10^{-3}$ as
a function of $N_t = N_r$ for ZF-LAS and ZF-SIC. It can
be seen that the SNR required to achieve $10^{-3}$ with ZF-LAS
significantly reduces for increasingly large $N_t = N_r$.
For example, the required SNR reduces from about 25 dB
for a SISO Rayleigh fading channel to about 7 dB for a
400 × 400 V-BLAST system using ZF-LAS. As we pointed
out in Fig. 3, this 400 × 400 system performance is almost
the same as that of a SISO AWGN channel where the SNR
required to achieve $10^{-3}$ BER is also close to 7 dB [22], i.e.,
$20\log(Q^{-1}(10^{-3})) \approx 7$ dB.
Fig. 3. Uncoded BER performance of ZF-LAS for V-BLAST as a function
of average received SNR for increasing values of $N_t = N_r$. BPSK, $N_t$
bps/Hz spectral efficiency. For large number of antennas (e.g., $N_t = N_r$ =
200, 400), the performance of ZF-LAS, MF-LAS, and MMSE-LAS are almost
the same.

Fig. 4. Average received SNR required to achieve a target uncoded BER
of $10^{-3}$ in V-BLAST for increasing values of $N_t = N_r$. BPSK. ZF-LAS
versus ZF-SIC. ZF-LAS achieves near SISO AWGN performance.

Fig. 5. Complexity of the LAS algorithm in terms of average number of
steps per transmit antenna till fixed point is reached in V-BLAST as a function
of $N_t = N_r$ for different SNRs and initial vectors (MF, ZF, MMSE). BPSK.
Results obtained from simulations.

Fig. 6. Ergodic capacity for 600 × 600 MIMO system with receive CSI.



B. Turbo Coded BER Performance 

In this subsection, we present the turbo coded BER performance
of the proposed LAS detector. We also quantify
how far the proposed detector's performance is from
the theoretical capacity. For the $N_t \times N_r$ MIMO system model
in Sec. II with perfect channel state information (CSI) at the
receiver, the ergodic capacity is given by [5]

$$C = E\left[\log\det\left(\mathbf{I}_{N_r} + (\gamma/N_t)\,\mathbf{H}\mathbf{H}^H\right)\right], \tag{19}$$

where $\mathbf{I}_{N_r}$ is the $N_r \times N_r$ identity matrix and $\gamma$ is the average
SNR per receive antenna. We have evaluated the capacity in
(19) for a 600 × 600 MIMO system through Monte-Carlo
simulations and plotted it as a function of average SNR in
Fig. 6. Figure 7 shows the simulated BER performance of the
proposed LAS detector for a 600 × 600 MIMO system with
BPSK and rate-1/3 turbo code (i.e., spectral efficiency = 200
bps/Hz). Figure 8 shows similar performance plots for rate-3/4
turbo code at a spectral efficiency of 450 bps/Hz. From the
capacity curve in Fig. 6, the minimum SNRs required at 200
bps/Hz and 450 bps/Hz spectral efficiencies are -5.4 dB and
-0.8 dB, respectively. The following interesting observations
can be made from Figs. 7 and 8:

• In terms of uncoded BER, the performance of MF, ZF,
and MMSE are different, with ZF and MMSE performing
the worst and best, respectively. But the performance of
MF-LAS, ZF-LAS, and MMSE-LAS are almost the same
(near-exponential diversity performance) with the number
of antennas being large ($N_t = N_r = 600$).

• With a rate-1/3 turbo code (Fig. 7), all the LAS detectors
considered (i.e., MF-LAS, ZF-LAS, MMSE-LAS)
achieve almost the same performance, which is about 4.6
dB away from capacity⁶ (i.e., near-vertical fall of coded
BER occurs at about -0.8 dB). Turbo coded MF/MMSE
without LAS also achieve good performance in this
case (i.e., less than only 2 dB away from turbo coded
MF/ZF/MMSE-LAS performance). This is because the
uncoded BER of MF and MMSE at around 0 to 2 dB
SNR are small enough for the turbo code to be effective.
However, this is not the case with turbo coded ZF without
LAS. As can be seen, in the range of SNRs shown, the
uncoded BER of ZF without LAS is so high (close to
0.5) that the vertical fall of coded BER can happen only
at very high SNRs, because of which we have not shown
the performance of turbo coded ZF without LAS.

6 We point out that the turbo coded BER curves shown in Figs. 7 to 11
in [23] have been plotted erroneously with an SNR shift of $-10\log r$ dB,
where $r$ is the turbo code rate, which amounted to a pessimistic prediction of
nearness to capacity. Here, we have corrected those plotting errors. Figures
7, 8, 9, and the nearness to capacity results given in Table I in this paper are
the corrected ones.

Fig. 7. BER performance of various detectors for rate-1/3 turbo-encoded
data using BPSK symbols in a 600 × 600 V-BLAST system. 200 bps/Hz
spectral efficiency. Proposed MF/ZF/MMSE-LAS detectors' performance is
away from capacity by 4.6 dB.

Fig. 8. BER performance of various detectors for rate-3/4 turbo-encoded
data using BPSK symbols in a 600 × 600 V-BLAST system. 450 bps/Hz
spectral efficiency. Proposed MF/ZF/MMSE-LAS detectors' performance is
away from capacity by 5.6 dB.

TABLE I
Nearness to capacity of various detectors for 600 × 600
V-BLAST with BPSK and various turbo code rates. Proposed
LAS detector performs to within about 4.6 dB, 4.7 dB, 5.6 dB
from capacity for 200, 300, and 450 bps/Hz spectral
efficiencies, respectively.

In Table I, we summarize the performance of various
detectors in terms of their nearness to capacity in a 600 × 600
V-BLAST system using BPSK, and rate-1/3, rate-1/2 and rate-3/4
turbo codes. From Table I, it can be seen that there is
a clear superiority of the proposed MF/ZF/MMSE-LAS over
MF/MMSE without LAS in terms of coded BER (nearness to
capacity) when high-rate turbo codes are used. For example,
when a rate-3/4 turbo code is used, the MF/ZF/MMSE-LAS
performs to within about 5.6 dB from capacity, whereas the
performance of rate-3/4 turbo coded MF/MMSE without LAS
is much farther away from capacity.

Performance of M-PAM/M-QAM: Although the LAS algorithm
in Sec. II is presented assuming BPSK, it can be
adopted for M-ary modulation including M-PAM and M-QAM.
In the case of BPSK, the elements of the data vector
take values from $\{\pm 1\}$. M-PAM symbols take discrete values
from $\{A_m, 1 \le m \le M\}$, where $A_m = (2m - 1 - M)$,
$m = 1, 2, \cdots, M$, and M-QAM is nothing but quadrature
PAM. We have adopted the LAS algorithm for M-PAM/M-QAM
and evaluated the performance of the LAS detector
for 4-PAM/4-QAM and 16-PAM/16-QAM without and with
coding. For M-PAM/M-QAM also, we have observed large
system behavior of the proposed detector similar to that
presented for BPSK. As an example, in Fig. 9 we present the
uncoded and coded performance of the MMSE-LAS detector
in a 600 x 600 V-BLAST system for 16-PAM/16-QAM with
rate-1/2 and rate-1/3 turbo codes at spectral efficiencies of
1200 bps/Hz and 800 bps/Hz, respectively. It can be observed
that the LAS detector performs to within
about 13 dB of the theoretical capacity.
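The M-PAM level rule $A_m = 2m - 1 - M$ can be illustrated with a short Python sketch. The function names are hypothetical, and the square-QAM construction (two $\sqrt{M}$-PAM components on the in-phase and quadrature axes) is an assumption about what "quadrature PAM" means here, not something spelled out in the text.

```python
def pam_alphabet(M):
    """Return the M-PAM levels A_m = 2m - 1 - M for m = 1, ..., M."""
    if M < 2 or M % 2 != 0:
        raise ValueError("M must be an even integer >= 2")
    return [2 * m - 1 - M for m in range(1, M + 1)]


def square_qam_alphabet(M):
    """Return a square M-QAM alphabet (assumed form: sqrt(M)-PAM on each axis)."""
    side = int(round(M ** 0.5))
    if side * side != M:
        raise ValueError("M must be a perfect square for square QAM")
    levels = pam_alphabet(side)
    return [complex(i, q) for i in levels for q in levels]


bpsk = pam_alphabet(2)             # the two BPSK values
pam4 = pam_alphabet(4)             # four equally spaced odd-integer levels
qam16 = square_qam_alphabet(16)    # 16 points on a 4 x 4 grid
```

With M = 2 the rule reduces to the BPSK alphabet, which is why the BPSK form of the algorithm is a special case of the M-PAM one.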

Effect of Channel Estimation Errors: As we pointed out
earlier, another key issue in large MIMO systems is channel
estimation [31], [32]. We have evaluated the effect of channel
estimation errors on the performance of the proposed detector
in V-BLAST by considering an estimation error model, where
the estimated channel matrix, $\hat{\mathbf{H}}$, is taken to be $\hat{\mathbf{H}} = \mathbf{H} + \Delta\mathbf{H}$,
where $\Delta\mathbf{H}$ is the estimation error matrix, the entries of which
are assumed to be i.i.d. complex Gaussian with zero mean
and variance $\sigma_e^2$. Our simulation results showed that in a
Fig. 9. Uncoded and coded BER performance of MMSE-LAS detector in
a 600 x 600 V-BLAST system for 16-PAM and 16-QAM with rate-1/2 and
rate-1/3 turbo codes.



High spectral efficiencies with large n can be achieved using
this code construction. For example, with n = 16 transmit
antennas, the 16 x 16 STBC from (20.a) with 16-QAM and
rate-3/4 turbo code achieves a spectral efficiency of 48 bps/Hz.
This high spectral efficiency is achieved along with the full
diversity of order $nN_r$.

However, since these STBCs are non-orthogonal, ML detec- 
tion gets increasingly impractical for large number of transmit 
antennas, n. Consequently, a key challenge in realizing the 
benefits of these full-rate non-orthogonal STBCs in practice 
is that of achieving near-ML performance for large number 
of transmit antennas at low detection complexities. Here, 
we show that near-ML detection of large MIMO signals 
originating from several tens of antennas using full-rate non- 
orthogonal STBCs is possible at practically affordable low 
complexities (using the proposed LAS detector), which is a 
significant new advancement that has not been reported in the 
MIMO detection literature so far. 



200 x 200 V-BLAST system with BPSK, rate-1/2 turbo code
and LAS detection, the coded BER degradation compared to
perfect channel estimation is only 0.2 dB and 0.6 dB for
channel estimation error variances of 1% and 5%, respectively.
The investigation of estimation algorithms and efficient pilot
schemes for accurate channel estimation in large MIMO
systems are important topics for further research.
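The estimation error model used above (estimated channel equals true channel plus an i.i.d. zero-mean complex Gaussian error matrix) can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the unit-variance Rayleigh channel entries are an assumption made so that error variances such as 0.01 and 0.05 read as 1% and 5%.

```python
import random


def complex_gaussian(variance, rng):
    """Draw a zero-mean circular complex Gaussian sample with the given variance."""
    s = (variance / 2.0) ** 0.5
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def noisy_channel_estimate(H, error_variance, rng):
    """Return H_hat = H + dH, where dH has i.i.d. CN(0, error_variance) entries."""
    return [[h + complex_gaussian(error_variance, rng) for h in row] for row in H]


rng = random.Random(0)
n_r, n_t = 40, 40  # small toy size; the paper's example uses 200 x 200
H = [[complex_gaussian(1.0, rng) for _ in range(n_t)] for _ in range(n_r)]  # assumed unit variance
H_hat = noisy_channel_estimate(H, 0.05, rng)

# Empirical check of the error variance over all entries.
errors = [abs(a - b) ** 2 for ra, rb in zip(H_hat, H) for a, b in zip(ra, rb)]
measured_variance = sum(errors) / len(errors)
```

A detector under test would then be run with `H_hat` in place of `H`, while the received signal is still generated with the true `H`.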

IV. Detection of Full-Rate Non-orthogonal 
STBCs 

V-BLAST with large number of antennas can offer high 
spectral efficiencies, but it does not provide transmit diversity. 
On the other hand, well known orthogonal STBCs have 
the advantages of full transmit diversity and low decoding 
complexity, but suffer from rate loss for increased number 
of transmit antennas [2], [26] -[28]. Full-rate non-orthogonal 
STBCs from division algebras (DA) [29], on the other hand, are 
attractive for achieving high spectral efficiencies in addition 
to achieving full transmit diversity, using large number of 
transmit antennas. 

Construction of full-rate non-orthogonal STBCs from DA
for arbitrary number of transmit antennas n is given by the
matrix in (20.a) at the bottom of this page [29]. In (20.a),
$\omega_n = e^{j 2\pi / n}$, $j = \sqrt{-1}$, and $x_{u,v}$, $0 \le u, v \le n - 1$, are the
data symbols from a QAM alphabet. Note that there are $n^2$
data symbols in one STBC matrix. When $\delta = e^{\sqrt{5}\, j}$ and
$t = e^{j}$, the STBC in (20.a) achieves full transmit diversity
(under ML decoding) as well as information-losslessness
[29]. When $\delta = t = 1$, the code ceases to be of full-diversity
(FD), but continues to be information-lossless (ILL) [30].



A. Uncoded BER Results for Large STBCs from DA 

We have adopted the proposed LAS detector for the decoding
of full-rate non-orthogonal STBCs. In Fig. 10 we
present the uncoded BER of the LAS detector in decoding
n x n full-rate non-orthogonal STBCs from DA in (20.a) for
n = 4, 8, 16, $\delta = t = 1$, and 4-QAM. It can be observed
that as the STBC code size n increases, the LAS performs
increasingly better such that it achieves close to SISO AWGN
performance (within 0.5 dB at $10^{-3}$ BER and less) with the
16 x 16 STBC. We point out that due to the high complexities
involved in decoding large size STBCs using other known 
detectors, the BER performance of STBCs with large n has 
not been reported in the literature so far. The very fact that 
we could show the simulated BER plots (both uncoded as 
well as turbo coded) for a 16 x 16 STBC with 256 complex 
symbols in one STBC matrix in itself is a clear indication of 
the superior low-complexity attribute of the proposed LAS 
detector. To our knowledge, we are the first to report the 
simulated BER performance of a 16 x 16 STBC from DA; 
this became feasible because of the low-complexity feature of 
the proposed detector. In addition, the achievement of near 
SISO AWGN performance with 16 x 16 STBC is a significant 
result from an implementation view point as well, since 16 
antennas can be easily placed in communication terminals of 
moderate size, which can make large MIMO systems practical. 

B. Turbo Coded BER Results for Large STBCs from DA 

In Fig. 11 we show the coded BER performance of the
16 x 16 STBC using different turbo code rates of 1/3, 1/2, 
Fig. 10. Uncoded BER performance of the proposed LAS detector in
decoding n x n full-rate non-orthogonal STBCs from DA for n = 4, 8, 16.
MMSE initial vector, 4-QAM, $N_t = N_r = n$. 16 x 16 STBC with 256
complex symbols in each STBC matrix achieves close to SISO AWGN
performance.

Fig. 11. Coded BER performance of the proposed LAS detector in decoding
16 x 16 full-rate non-orthogonal STBC from DA. $N_t = N_r = 16$. MMSE
initial vector, 4-QAM. Rates of turbo codes: 1/3, 1/2, 3/4. Proposed LAS
detector performs close to within about 5.5 dB from the theoretical capacity.



and 3/4. With 4-QAM, these turbo code rates along with
the 16 x 16 STBC from DA correspond to spectral efficiencies
of 10.6 bps/Hz, 16 bps/Hz and 24 bps/Hz, respectively. The
minimum SNRs required to achieve these capacities are also
shown in Fig. 11. It can be observed that the proposed detector
performs to within about 5.5 dB of the capacity, which is an
impressive result. In all the turbo coded BER plots in this
paper, we have used hard decision outputs from the LAS
algorithm. In [25], we have proposed a method to generate soft
decision outputs from the LAS algorithm for the individual
bits that form the QAM/PAM symbols. With the proposed
soft decision LAS outputs in [25], the coded performance is
found to move closer to capacity by an additional 1 to 1.5 dB
compared to that achieved using hard decision LAS outputs
reported in this paper.
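The spectral efficiencies quoted in this paper follow from a simple product: symbols per channel use, times bits per symbol, times turbo code rate. The Python sketch below reproduces that arithmetic. The function name is hypothetical, and it assumes that an n x n STBC with $n^2$ symbols occupies n channel uses, i.e., n symbols per channel use.

```python
from fractions import Fraction


def spectral_efficiency(symbols_per_channel_use, bits_per_symbol, code_rate):
    """Information bits per channel use (bps/Hz), kept exact with Fractions."""
    return symbols_per_channel_use * bits_per_symbol * Fraction(code_rate)


rates = (Fraction(1, 3), Fraction(1, 2), Fraction(3, 4))

# 16 x 16 STBC from DA with 4-QAM (2 bits/symbol), 16 symbols per channel use.
stbc_4qam = [spectral_efficiency(16, 2, r) for r in rates]

# 600 x 600 V-BLAST with BPSK (1 bit/symbol), 600 symbols per channel use.
vblast_bpsk = [spectral_efficiency(600, 1, r) for r in rates]

# 16 x 16 STBC with 16-QAM (4 bits/symbol) and a rate-3/4 turbo code.
stbc_16qam = spectral_efficiency(16, 4, Fraction(3, 4))
```

The rate-1/3 STBC case evaluates to 32/3, about 10.67 bps/Hz, which the text quotes as 10.6 bps/Hz.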

V. LAS Detector for Multicarrier CDMA 

In this section, we present the proposed LAS detector for
multicarrier CDMA, its performance and complexity. Consider
a K-user synchronous multicarrier DS-CDMA system with
M subcarriers. Let $b_k \in \{+1, -1\}$ denote the binary data
symbol of the kth user, which is sent in parallel on M
subcarriers [33], [34]. Let N denote the number of chips-per-bit
in the signature waveforms. It is assumed that the channel
is frequency non-selective on each subcarrier and the fading is
slow (assumed constant over one bit interval) and independent
from one subcarrier to the other.

Let $\mathbf{y}^{(i)} = [\, y_1^{(i)} \; y_2^{(i)} \; \cdots \; y_K^{(i)} \,]^T$ denote the K-length
received signal vector on the ith subcarrier; i.e., $y_k^{(i)}$ is the output
of the kth user's matched filter on the ith subcarrier. Assuming
that the inter-carrier interference is negligible, the K-length
received signal vector on the ith subcarrier $\mathbf{y}^{(i)}$ can be written
in the form

$$\mathbf{y}^{(i)} = \mathbf{R}^{(i)} \mathbf{H}^{(i)} \mathbf{A} \mathbf{b} + \mathbf{n}^{(i)}, \qquad (20)$$



where $\mathbf{R}^{(i)}$ is the K x K cross-correlation matrix on the ith
subcarrier, with its entries $\rho_{lj}^{(i)}$ denoting the normalized cross
correlation coefficient between the signature waveforms of the
lth and jth users on the ith subcarrier. $\mathbf{H}^{(i)}$ represents the
K x K channel matrix, given by

$$\mathbf{H}^{(i)} = \mathrm{diag}\{ h_1^{(i)}, h_2^{(i)}, \cdots, h_K^{(i)} \}, \qquad (21)$$



where the channel coefficients $h_k^{(i)}$, $i = 1, 2, \cdots, M$, are assumed
to be i.i.d. complex Gaussian r.v's (i.e., fade amplitudes
are Rayleigh distributed) with zero mean and
$E[(h_{kI}^{(i)})^2] = E[(h_{kQ}^{(i)})^2] = 0.5$, where $h_{kI}^{(i)}$ and $h_{kQ}^{(i)}$ are the real and
imaginary parts of $h_k^{(i)}$. The K-length data vector b is given
by

$$\mathbf{b} = [\, b_1 \; b_2 \; \cdots \; b_K \,]^T, \qquad (22)$$

and the K x K diagonal amplitude matrix A is given by

$$\mathbf{A} = \mathrm{diag}\{ A_1, A_2, \cdots, A_K \}, \qquad (23)$$

where $A_k$ denotes the transmit amplitude of the kth user. The
K-length noise vector $\mathbf{n}^{(i)}$ is given by

$$\mathbf{n}^{(i)} = [\, n_1^{(i)} \; n_2^{(i)} \; \cdots \; n_K^{(i)} \,]^T, \qquad (24)$$

where $n_k^{(i)}$ denotes the additive noise component of the kth
user on the ith subcarrier, which is assumed to be complex
Gaussian with zero mean, with $E[n_k^{(i)} (n_j^{(i)})^*] = \sigma^2$ when $j = k$
and $E[n_k^{(i)} (n_j^{(i)})^*] = \sigma^2 \rho_{jk}^{(i)}$ when $j \ne k$. We assume that
all the channel coefficients are perfectly known at the receiver.
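The per-subcarrier model above (matched-filter output equals cross-correlation matrix times diagonal channel times amplitudes times data, plus correlated noise) can be made concrete with a small simulation sketch in plain Python. All function names are hypothetical. The sketch assumes random binary signatures with unit energy and white chip-level noise, so that the matched-filter noise has covariance $\sigma^2 \mathbf{R}^{(i)}$ as stated in the text.

```python
import random


def complex_gaussian(variance, rng):
    """Zero-mean circular complex Gaussian sample with the given variance."""
    s = (variance / 2.0) ** 0.5
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def random_spreading(N, K, rng):
    """N x K matrix of random binary chips, scaled so each signature has unit energy."""
    chip = 1.0 / N ** 0.5
    return [[rng.choice((-chip, chip)) for _ in range(K)] for _ in range(N)]


def cross_correlation(S):
    """R = S^T S: normalized cross-correlations between the K signatures."""
    K = len(S[0])
    return [[sum(row[l] * row[j] for row in S) for j in range(K)] for l in range(K)]


def matched_filter_output(S, h, A, b, sigma2, rng):
    """Return y = R H A b + n for one subcarrier, with n = S^T w, w ~ CN(0, sigma2 I)."""
    K = len(b)
    R = cross_correlation(S)
    x = [h[k] * A[k] * b[k] for k in range(K)]        # H A b (H and A are diagonal)
    w = [complex_gaussian(sigma2, rng) for _ in S]    # chip-level noise (assumed white)
    return [sum(R[k][j] * x[j] for j in range(K))
            + sum(S[c][k] * w[c] for c in range(len(S)))
            for k in range(K)]


rng = random.Random(7)
K, N, M = 4, 8, 2                                     # toy sizes, for illustration only
b = [rng.choice((-1, 1)) for _ in range(K)]
A = [1.0] * K                                         # equal transmit amplitudes
y = []
for i in range(M):                                    # independent fading per subcarrier
    S = random_spreading(N, K, rng)
    h = [complex_gaussian(1.0, rng) for _ in range(K)]
    y.append(matched_filter_output(S, h, A, b, 0.1, rng))
```

Because the noise is formed as $\mathbf{S}^T \mathbf{w}$, its entries are correlated through the same cross-correlation matrix that couples the users, which is the property the likelihood function has to account for.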

A. LAS Algorithm for MC-CDMA 

We note that once the likelihood function for the MC- 
CDMA system in the above is obtained, it is straightforward 
to adopt the LAS algorithm for MC-CDMA. Accordingly, in 
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the multicarrier system considered, the most likely b is taken 
as that b which maximizes 
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i=l 



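The likelihood function in (25) can be written in a form similar
to Eqn. (4.11) in [16] as

$$\Lambda(\mathbf{b}) = \mathbf{b}^T \mathbf{A} \mathbf{y}_{ceff} - \mathbf{b}^T \mathbf{H}_{ceff} \mathbf{b}, \qquad (26)$$

where

$$\mathbf{y}_{ceff} = \sum_{i=1}^{M} \left( (\mathbf{H}^{(i)})^* \mathbf{y}^{(i)} + \mathbf{H}^{(i)} (\mathbf{y}^{(i)})^* \right), \qquad (27)$$

$$\mathbf{H}_{ceff} = \sum_{i=1}^{M} \mathbf{A} \mathbf{H}^{(i)} \mathbf{R}^{(i)} (\mathbf{H}^{(i)})^* \mathbf{A}. \qquad (28)$$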



Now observing the similarity between (26) and the corresponding
V-BLAST expression in Sec. II-A, the LAS algorithm for MC-CDMA can be arrived at,
along the same lines as that of V-BLAST in the previous
section, with $\mathbf{y}_{eff}$, $\mathbf{H}_{eff}$ and $\mathbf{H}_{real}$ replaced by $\mathbf{y}_{ceff}$, $\mathbf{H}_{ceff}$,
and $\mathbf{H}_{creal}$, respectively, with all other notations, definitions,
and procedures in the algorithm remaining the same.
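To make the likelihood-ascent idea concrete for the MC-CDMA quantities in (26) to (28), the following Python sketch performs a sequential single-bit search. It is a simplified illustration, not the algorithm as specified earlier in the paper: it evaluates the likelihood change of each candidate flip directly rather than through the gradient vector g(n), and all function names are hypothetical. It relies on the fact that for a real-valued b only the real part of $\mathbf{H}_{ceff}$ contributes to (26).

```python
def effective_quantities(ys, hs, Rs, A):
    """Build y_ceff as in (27) and the real part of H_ceff as in (28).

    ys[i][k], hs[i][k]: matched-filter output and channel gain of user k on subcarrier i.
    Rs[i]: real K x K cross-correlation matrix on subcarrier i. A[k]: amplitude of user k.
    """
    K = len(A)
    y_ceff = [sum((h[k].conjugate() * y[k] + h[k] * y[k].conjugate()).real
                  for y, h in zip(ys, hs)) for k in range(K)]
    H_real = [[sum((A[l] * h[l] * R[l][j] * h[j].conjugate() * A[j]).real
                   for h, R in zip(hs, Rs)) for j in range(K)] for l in range(K)]
    return y_ceff, H_real


def likelihood(b, A, y_ceff, H_real):
    """Evaluate (26) for a real-valued (+1/-1) vector b."""
    K = len(b)
    linear = sum(b[k] * A[k] * y_ceff[k] for k in range(K))
    quadratic = sum(b[l] * H_real[l][j] * b[j] for l in range(K) for j in range(K))
    return linear - quadratic


def flip_gain(b, p, A, y_ceff, H_real):
    """Change in (26) if bit p is flipped, computed in O(K) operations."""
    Hb_p = sum(H_real[p][j] * b[j] for j in range(len(b)))
    return -2 * b[p] * A[p] * y_ceff[p] + 4 * b[p] * Hb_p - 4 * H_real[p][p]


def sequential_las(b0, A, y_ceff, H_real, tol=1e-12):
    """Check bits circularly, flip whenever the likelihood increases, stop at a fixed point."""
    b = list(b0)
    improved = True
    while improved:
        improved = False
        for p in range(len(b)):
            if flip_gain(b, p, A, y_ceff, H_real) > tol:
                b[p] = -b[p]
                improved = True
    return b


# Toy usage: K = 3 users, M = 2 subcarriers, noiseless, mildly correlated signatures.
R = [[1.0, 0.2, -0.1], [0.2, 1.0, 0.3], [-0.1, 0.3, 1.0]]
A = [1.0, 1.0, 1.0]
b_true = [1, -1, 1]
hs = [[1 + 1j, 0.5 - 0.2j, -0.7j], [0.3 + 0j, -1 + 0.5j, 0.8 + 0.1j]]
ys = [[sum(R[k][j] * h[j] * A[j] * b_true[j] for j in range(3)) for k in range(3)]
      for h in hs]
y_ceff, H_real = effective_quantities(ys, hs, [R, R], A)
b_hat = sequential_las([1, 1, 1], A, y_ceff, H_real)
```

Each accepted flip strictly increases the likelihood, so the search terminates at a vector that no single-bit flip can improve. Such a fixed point is a local maximum and is not guaranteed to be the ML solution.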

B. Complexity of the Proposed Detector for MC-CDMA 

The complexity of the proposed detector for MC-CDMA
can be analyzed in a similar manner as done for V-BLAST
in Sec. II. First, given an initial vector, the LAS operation
part alone in MC-CDMA has an average per-bit complexity
of $O(MK)$, which is due to i) initial computation of g(0) in
(8), which requires $O(MK)$ complexity per bit, ii) update of
g(n) in each step as per (18), which requires $O(K)$ complexity
for sequential LAS, and hence constant per-bit complexity, and
iii) the average number of steps required to reach a fixed point,
which, through simulations, is found to have a constant per-bit
complexity. Next, the initial vector generation using MMSE or
ZF has a per-bit complexity of $O(K^2)$ for K > M. Finally,
combining the above complexities involved in the LAS part
and the initial vector generation part, the overall average per-bit
complexity of the MMSE/ZF-LAS detector for MC-CDMA
is $O(K^2)$. The initial vector generation using MF has a per-bit
complexity of only $O(M)$. Hence, if the MF output is used as
the initial vector, then the overall average per-bit complexity
of the MF-LAS is the same as that of the LAS alone, which
is $O(MK)$. For large K, the performance of MF-LAS, ZF-LAS,
and MMSE-LAS are almost the same (see Fig. 13), and
hence MF-LAS is preferred because of its linear complexity
in number of users, K, for a given M.

C. Results and Discussions for MC-CDMA 

We evaluated the BER performance of the proposed LAS
detector for MC-CDMA through simulations. We evaluated the
uncoded BER performance of the proposed LAS detector as
a function of average SNR, number of users (K), number
Fig. 12. BER performance of ZF-LAS and MF-LAS detectors as a function of
average SNR for single carrier CDMA in Rayleigh fading. M = 1, K = 200,
N = 300, i.e., $\alpha = 2/3$.



of subcarriers (M), and number of chips per bit (N). We
also evaluate the BER performance as a function of loading
factor, $\alpha$, where, as done in the CDMA literature [16], we
define $\alpha = \frac{K}{MN}$. We call the system underloaded when
$\alpha < 1$, fully loaded when $\alpha = 1$, and overloaded when
$\alpha > 1$. Random binary sequences of length N are used
as the spreading sequences on each subcarrier. In order to
make a fair comparison between the performance of MC-CDMA
systems with different number of subcarriers, we keep
the system bandwidth the same by keeping MN constant.
Also, in that case we keep the total transmit power to be the
same irrespective of the number of subcarriers used. In the
simulation plots we show in this section, we have assumed that
all users transmit with equal amplitude.⁷ The LAS algorithm
used is the SLAS with circular checking of bits starting from
the first user's bit.
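As a quick arithmetic check, the sketch below computes the loading factor for the parameter sets quoted with Figs. 12 to 15. It assumes the loading factor is K divided by the product MN, which is the reading consistent with all of those parameter values. The function names are hypothetical.

```python
from fractions import Fraction


def loading_factor(K, M, N):
    """Loading factor, assumed here to be K / (M * N)."""
    return Fraction(K, M * N)


def regime(alpha):
    """Classify the system load as in the text."""
    if alpha < 1:
        return "underloaded"
    if alpha == 1:
        return "fully loaded"
    return "overloaded"


single_carrier = loading_factor(200, 1, 300)              # K = 200, M = 1, N = 300
fully_loaded = [loading_factor(100, M, 100 // M) for M in (1, 2, 4)]  # MN = 100
lightest = loading_factor(30, 4, 300)                     # K = 30, M = 4, N = 300
heaviest = loading_factor(30, 4, 5)                       # K = 30, M = 4, N = 5
```

Keeping MN fixed while changing M, as in the fully loaded example, leaves the loading factor unchanged, which is what makes the comparison across different numbers of subcarriers a fair one.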

First, in Fig. 12 we present the BER performance of
MF/ZF-LAS detectors as a function of average SNR in a single
carrier (i.e., M = 1) underloaded system, where we consider
$\alpha = 2/3$ by taking K = 200 users and N = 300 chips per
bit. For comparison purposes, we also plot the performance
of MF and ZF without LAS. Single user (SU) performance,
which corresponds to the case of no multiuser interference
(i.e., K = 1), is also shown as a lower bound on the achievable
multiuser performance. From Fig. 12 we can observe that the
performance of the MF and ZF detectors is far away from the
SU performance, whereas the ZF-LAS as well as MF-LAS
detectors almost achieve the SU performance. We point out
that, like the ZF detector, other suboptimum detectors including
MMSE, SIC, and PIC detectors [16] also do not achieve
near SU performance for the considered loading factor of 2/3,
whereas the MF-LAS detector achieves near SU performance,
7 We note that we have simulated the MF/ZF-LAS performance in near-far 
conditions as well. Even with near-far effect, the MF/ZF-LAS detectors have 
been observed to achieve near single-user performance. 
Fig. 13. BER performance of ZF-LAS and MF-LAS detectors as a function
of number of users, K, for single carrier CDMA (M = 1) in Rayleigh fading
for a fixed $\alpha = 2/3$ and average SNR = 15 dB. N varied from 15 to 1500.

Fig. 14. BER performance of ZF-LAS and MF-LAS detectors as a function
of average SNR for multicarrier CDMA in Rayleigh fading. M = 1, 2, 4,
$\alpha = 1$, K = 100, MN = 100.



that too at a lesser complexity than these other suboptimum 
detectors. 

Next, in Fig. 13 we show the BER performance of the
MF/ZF-LAS detectors for M = 1 as a function of number
of users, K, for a fixed value of $\alpha = 2/3$ at an average
SNR of 15 dB. We varied K from 10 to 1000 users. SU
performance is also shown (as the bottom most horizontal
line) for comparison. It can be seen that, for the fixed value
of $\alpha = 2/3$, both the MF-LAS as well as the ZF-LAS achieve
near SU performance (even in the presence of 1000 users),
whereas the ZF and MF detectors do not achieve the SU
performance.

In Fig. 14 we show the BER performance of the MF/ZF-LAS
detectors as a function of average SNR for different
number of subcarriers, namely, M = 1, 2, 4, keeping a
constant MN = 100, for a fully loaded system (i.e., $\alpha = 1$)
with K = 100. Keeping $\alpha = 1$ and K = 100 for all cases
means that i) N = 100 for M = 1, ii) N = 50 for M = 2,
and iii) N = 25 for M = 4. The SU performance for
M = 1 (1st order diversity), M = 2 (2nd order diversity), and
M = 4 (4th order diversity) are also plotted for comparison.
These diversities are essentially due to the frequency diversity
effect resulting from multicarrier combining of signals from M
subcarriers. It is interesting to see that even in a fully loaded
system, the MF/ZF-LAS detectors achieve all the frequency
diversity possible in the system (i.e., MF/ZF-LAS detectors
achieve SU performance with 1st, 2nd and 4th order diversities
for M = 1, 2 and 4, respectively). On the other hand, the ZF
detector is unable to achieve the frequency diversity in the fully
loaded system, and its performance is very poor compared to
MF/ZF-LAS detectors.

Next, in Fig. 15 we present the BER performance of
ZF/MF-LAS detectors in a MC-CDMA system with M = 4
as a function of loading factor, $\alpha$, where we vary $\alpha$ from
0.025 to 1.5. We realize this variation in $\alpha$ by fixing K = 30,
M = 4, and varying N from 300 to 5. The average SNR



Fig. 15. BER performance of ZF-LAS and MF-LAS detectors as a function
of loading factor, $\alpha$, for multicarrier CDMA in Rayleigh fading. M = 4,
K = 30, N varied from 300 to 5, average SNR = 8 dB.



considered is 8 dB. From Fig. 15 it can be observed that as
$\alpha$ increases all detectors lose performance, but the MF/ZF-LAS
detectors can offer relatively good performance even at
overloaded conditions of $\alpha > 1$. Another observation is that at
$\alpha > 1$, MF-LAS performs slightly better than ZF-LAS. This
is because $\alpha > 1$ corresponds to a high interference condition,
and it is known in MUD literature [16] that ZF can perform
worse than MF at low SNRs and high interference. In such
cases, starting with a better performing MF output as the initial
vector, MF-LAS performs better.

Further to our present work on the application of MF/ZF-LAS
detectors for MC-CDMA, several extensions are possible
on the practical application of these detectors in CDMA
systems. Two such useful extensions are i) MF/ZF/MMSE-LAS
for frequency selective CDMA channels with RAKE
combining; we point out that a similar approach and system
model adopted here for MC-CDMA is applicable, by taking
a view of equivalence between frequency diversity through
MC combining and multipath diversity through RAKE combining,
and ii) MF/ZF/MMSE-LAS for asynchronous CDMA
systems, which can be carried out once the system model is
appropriately written in a form similar to (20). These two
extensions can allow MF/ZF-LAS detectors to be practical
in CDMA systems (e.g., 2G and 3G CDMA systems), with
potential for significant gains in system capacity. Current 
approaches to MUD considered in practical CDMA systems 
appear to be mainly PIC and SIC. However, the illustrated 
fact that MF-LAS can easily outperform PIC/SIC detectors 
both in performance and complexity for large number of users 
suggests that MF-LAS can be a powerful MUD approach in 
practical CDMA systems. 

VI. Conclusions 

We presented a near-capacity achieving, low-complexity 
detector for large MIMO systems having tens to hundreds of 
antennas, and showed its uncoded/coded BER performance 
in the detection of V-BLAST and in the decoding of full- 
rate non-orthogonal STBCs from DA. The proposed detector 
was shown to have excellent attributes in terms of both 
low complexity as well as nearness to theoretical capacity 
performance, achieving high spectral efficiencies of the order 
of tens to hundreds of bps/Hz. To our knowledge, this is the
first report in the literature of the decoding of a large full-rate
non-orthogonal STBC like the 16 x 16 STBC from DA and of
its BER/nearness to capacity results. We further
point out that the proposed detector has good potential for
application in practical MIMO wireless standards, e.g., the
low-complexity feature of the proposed detector can allow the
inclusion of 4 x 4, 8 x 8, 16 x 16 non-orthogonal STBCs
from DA into MIMO wireless standards like IEEE 802.11n
and IEEE 802.16e, which, in turn, can achieve higher spectral
efficiencies than those currently possible in these standards.

We conclude this paper by pointing to the following remark 
made by the author of [2] in its preface in 2005: "It was just 
a few years ago, when I started working at AT&T Labs - 
Research, that many would ask 'who would use more than one 
antenna in a real system?' Today, such skepticism is gone." 
Extending this sentiment, we believe large MIMO systems 
would be practical in the future, and the practical feasibility 
of low-complexity detectors like the one we presented in this 
paper could be a potential trigger to create wide interest in 
the implementation of large MIMO systems. For example, 
antenna/RF technologies and channel estimation for large 
MIMO systems could open up as new focus areas. Potential 
large MIMO applications include inter-base station/base sta- 
tion controller back-haul connectivity using large MIMO links, 
and wireless IPTV. Other interesting large MIMO applications 
can be thought of as well. 